Data Reading and Processing: In this step, the provided dataset was read and the timestamp column was set as the index. NaN values were filled with pandas' fillna operation, using backfill followed by forward fill so that no empty cells remain. Min-max scaling was then applied so that every column is scaled to the same interval. The first five rows of the original dataset are shown below.
#libraries used and data reading
import pandas as pd
import numpy as np
import datetime
import time as tm
import pytz
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller
from sklearn.preprocessing import MinMaxScaler
sns.set()
#Reading the data and filling nan values
df_read=pd.read_csv("C:/Users/kteke/OneDrive/Desktop/all_ticks_wide.csv")
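df_read.timestamp = pd.to_datetime(df_read.timestamp)
df_read.set_index('timestamp', inplace=True)
#Keep an independent copy of the unfilled, unscaled data for display
df_original=df_read.copy()
#Backfill first, then forward fill so that trailing gaps are covered as well
df_read=df_read.bfill().ffill()
#Scale every column to the [0, 1] interval
scaler=MinMaxScaler()
scaled_data=scaler.fit_transform(df_read.values)
df=pd.DataFrame(scaled_data, index=df_read.index, columns=df_read.columns)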
df_original.head()
| timestamp | AEFES | AKBNK | AKSA | AKSEN | ALARK | ALBRK | ANACM | ARCLK | ASELS | ASUZU | ... | TTKOM | TUKAS | TUPRS | USAK | VAKBN | VESTL | YATAS | YKBNK | YUNSA | ZOREN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2012-09-17 06:45:00+00:00 | 22.3978 | 5.2084 | 1.7102 | 3.87 | 1.4683 | 1.1356 | 1.0634 | 6.9909 | 2.9948 | 2.4998 | ... | 4.2639 | 0.96 | 29.8072 | 1.0382 | 3.8620 | 1.90 | 0.4172 | 2.5438 | 2.2619 | 0.7789 |
| 2012-09-17 07:00:00+00:00 | 22.3978 | 5.1938 | 1.7066 | 3.86 | 1.4574 | 1.1275 | 1.0634 | 6.9259 | 2.9948 | 2.5100 | ... | 4.2521 | 0.96 | 29.7393 | 1.0382 | 3.8529 | 1.90 | 0.4229 | 2.5266 | 2.2462 | 0.7789 |
| 2012-09-17 07:15:00+00:00 | 22.3978 | 5.2084 | 1.7102 | NaN | 1.4610 | 1.1356 | 1.0679 | 6.9909 | 2.9855 | 2.4796 | ... | 4.2521 | 0.97 | 29.6716 | 1.0463 | 3.8436 | 1.91 | 0.4229 | 2.5266 | 2.2566 | 0.7789 |
| 2012-09-17 07:30:00+00:00 | 22.3978 | 5.1938 | 1.7102 | 3.86 | 1.4537 | 1.1275 | 1.0679 | 6.9584 | 2.9855 | 2.4897 | ... | 4.2521 | 0.97 | 29.7393 | 1.0382 | 3.8529 | 1.91 | 0.4286 | 2.5324 | 2.2619 | 0.7860 |
| 2012-09-17 07:45:00+00:00 | 22.5649 | 5.2084 | 1.7102 | 3.87 | 1.4574 | 1.1356 | 1.0725 | 6.9909 | 2.9760 | 2.4897 | ... | 4.2521 | 0.97 | 29.8072 | 1.0382 | 3.8620 | 1.90 | 0.4286 | 2.5324 | 2.2619 | 0.7789 |
5 rows × 60 columns
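The fill-then-scale step can be illustrated on a small frame. This is only a sketch under assumptions: the `toy` frame and its values are hypothetical, and the scaling is written out as the min-max formula, which should match the default [0, 1] range of `MinMaxScaler`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices with gaps at the start, in the middle and at the end.
toy = pd.DataFrame(
    {"A": [np.nan, 2.0, np.nan, 4.0], "B": [10.0, np.nan, 30.0, np.nan]},
    index=pd.date_range("2012-09-17 06:45", periods=4, freq="15min", tz="UTC"),
)

# Backfill first; forward fill then covers any gap left at the end of a column.
filled = toy.bfill().ffill()

# Column-wise min-max scaling to the [0, 1] interval.
scaled = (filled - filled.min()) / (filled.max() - filled.min())

print(filled)
print(scaled)
```

Backfilling copies the next observed price backwards in time, which may matter if the filled series are later used for forecasting.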
Data Visualization: The data were plotted with matplotlib's plot() function. Since there are 60 variables, 20 plots with 3 stocks each were constructed to keep the charts readable. As the plots clearly show, nearly all of the variables have a trend, which affects how the rest of the analysis is conducted.
#Visualize data
for i in range(20):
    df.iloc[:,i*3:(i+1)*3].plot(figsize=(20,10), linewidth=2, fontsize=20)
    plt.xlabel('date', fontsize=20);
Density Plot for the Dataset: A density plot was constructed for every variable. These plots serve visualization purposes only: because the data have a trend, meaningful results cannot be obtained from them.
for j in range(0,6):
    fig, axes = plt.subplots(2,5, figsize=(15, 5))
    ax = axes.flatten()
    for i, col in enumerate(df.columns[j*10:(j+1)*10]):
        sns.kdeplot(df[col], ax=ax[i])
        ax[i].set_title(col)
    fig.tight_layout(w_pad=6, h_pad=4)
    plt.show()
Density Plot for the Dataset (First Difference): To detrend the data, the first difference was taken; the corresponding density plots are provided below. Visual inspection suggests that the first difference of most variables follows a Laplace-like distribution. However, two density plots, those of 'TUKAS' and 'ALBRK', suggest that the underlying distribution is multimodal. These two stocks are prime candidates for correlation analysis: a multimodal distribution suggests that several stochastic processes underlie the series, so different processes may coincide in different time intervals and, in theory, create different correlation patterns across those intervals.
df_dif=df.diff()
#The first row of a difference is NaN; fill it from the next row
df_dif=df_dif.bfill()
df_dif_num=df_dif.copy()
for j in range(0,10):
    fig, axes = plt.subplots(3,2, figsize=(30, 10))
    ax = axes.flatten()
    for i, col in enumerate(df_dif_num.columns[j*6:(j+1)*6]):
        sns.kdeplot(df_dif_num[col], ax=ax[i])
        ax[i].set_title(col)
    plt.show()
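A rough numerical companion to the visual Laplace check is excess kurtosis, which is 3 for a Laplace distribution and 0 for a normal one. The sketch below uses synthetic samples rather than the stock data, and the column names are hypothetical. pandas' `kurt()` reports excess kurtosis, so a differenced series that is Laplace-like would be expected to land near 3.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins for differenced series (not the stock data).
synthetic = pd.DataFrame({
    "laplace_like": rng.laplace(size=200_000),
    "normal_like": rng.normal(size=200_000),
})

# DataFrame.kurt() returns excess (Fisher) kurtosis for every column.
excess_kurtosis = synthetic.kurt()
print(excess_kurtosis)
```

Applied to the real data, the same call would be `df.diff().kurt()`. Heavy tails caused by a few extreme jumps can inflate the value well beyond 3, so this is a hint rather than a test.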
Histogram of Negative and Positive Values: A histogram of the number of positive, negative, and zero differences was constructed for each variable, and discrepancies between the counts of positive and negative differences were looked for. No visual evidence was found that any stock favors a certain direction.
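#Replace every difference by its sign (1, -1 or 0)
#Note: this overwrites df_dif in place, so any later use of df_dif sees the sign series, not the raw differences
df_dif[df_dif > 0] = 1
df_dif[df_dif < 0] = -1
df_dif[df_dif == 0] = 0
for j in range(0,6):
    fig, axes = plt.subplots(2,5, figsize=(15, 5))
    ax = axes.flatten()
    for i, col in enumerate(df_dif.columns[j*10:(j+1)*10]):
        sns.histplot(df_dif[col], ax=ax[i])
        ax[i].set_title(col)
    #fig.tight_layout(w_pad=6, h_pad=4)
    plt.show()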
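The same positive/negative/zero comparison can also be read from a table of counts. The sketch below is one possible way to do it, assuming a small hypothetical `toy` frame; it uses `np.sign` so that the differenced frame itself is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices for two stocks.
toy = pd.DataFrame({"A": [1.0, 1.5, 1.5, 1.2, 1.4],
                    "B": [2.0, 1.0, 1.0, 1.0, 3.0]})
toy_diff = toy.diff().dropna()

# np.sign returns a new frame of -1.0 / 0.0 / 1.0; toy_diff is not modified.
signs = np.sign(toy_diff)

# Count each sign per column; signs that never occur get a count of 0.
sign_counts = (
    signs.apply(pd.Series.value_counts)
    .reindex([-1.0, 0.0, 1.0])
    .fillna(0)
    .astype(int)
)
print(sign_counts)
```

A stock that favored one direction would show a clear gap between its -1 and 1 rows.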
Plots for First Difference and Second Difference: First and second difference plots were constructed for the variables. As can be seen, there is a steep spike in the differences between 2013 and 2014. The exact date is to be determined later and used in the Google Trends analysis. Higher price volatility can also be observed after 2017.
#First difference plots
for i in range(20):
    df.diff().iloc[:,i*3:(i+1)*3].plot(figsize=(20,10), linewidth=2, fontsize=20)
    plt.xlabel('date', fontsize=20)
df_dif_2=df.diff().diff()
df_dif_2=df_dif_2.bfill()
#Second difference plots
for i in range(20):
    df.diff().diff().iloc[:,i*3:(i+1)*3].plot(figsize=(20,10), linewidth=0.5, fontsize=20)
    plt.xlabel('date', fontsize=20)
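The date of the spike could be located programmatically rather than read off the chart. A minimal sketch, assuming a hypothetical `toy` frame with a datetime index: take the absolute first difference and ask for the index label of its maximum in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with one large jump per column.
idx = pd.date_range("2013-01-01", periods=6, freq="D")
toy = pd.DataFrame(
    {"A": [1.0, 1.1, 1.0, 5.0, 5.1, 5.0],
     "B": [2.0, 2.0, 2.1, 2.0, 2.0, 0.5]},
    index=idx,
)

abs_diff = toy.diff().abs()

# Timestamp and size of the largest absolute jump per column (NaN is skipped).
spike_dates = abs_diff.idxmax()
spike_sizes = abs_diff.max()
print(pd.DataFrame({"date": spike_dates, "size": spike_sizes}))
```

On the scaled data the equivalent call would be `df.diff().abs().idxmax()`. If many stocks report the same timestamp, that timestamp is a natural candidate for the spike.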
Stationarity Tests: The Augmented Dickey-Fuller (ADF) test was applied to the first difference, the second difference, and the raw data. Results are provided below. The hypotheses are H0: the series in question is non-stationary, and H1: the series in question is stationary. If the p-value is smaller than 0.05, we reject the null hypothesis and conclude that the series is stationary. The ADF results suggest that the stationarity assumption is in doubt even after second differencing, since some of the tested series still have p-values above 0.05. This leaves the analysis in this homework on uncertain footing.
#Stationarity test for first differences
from statsmodels.tsa.stattools import adfuller
first_difference_stationary_test=[]
for i in range(df_dif.values.shape[1]):
    #Note: df_dif.values[i] is row i of the array (one timestamp across the 60 stocks);
    #the series of the i-th stock would be df_dif.values[:, i]
    result=adfuller(df_dif.values[i])
    first_difference_stationary_test.append(result)
    print('Column no: %f' %i)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
| Column no | ADF Statistic | p-value | Critical value 1% | Critical value 5% | Critical value 10% |
|---|---|---|---|---|---|
| 0 | -8.099270 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 1 | -8.099270 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 2 | -4.216730 | 0.000617 | -3.560 | -2.918 | -2.597 |
| 3 | -4.077874 | 0.001053 | -3.548 | -2.913 | -2.594 |
| 4 | -5.995265 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 5 | -8.936146 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 6 | -3.319504 | 0.014029 | -3.551 | -2.914 | -2.595 |
| 7 | -6.152137 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 8 | -9.456608 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 9 | -8.314044 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 10 | -8.523931 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 11 | -8.244489 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 12 | -7.300131 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 13 | -1.769749 | 0.395580 | -3.575 | -2.924 | -2.600 |
| 14 | -6.793439 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 15 | -6.826620 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 16 | -1.511662 | 0.527735 | -3.575 | -2.924 | -2.600 |
| 17 | -7.831539 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 18 | -9.016590 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 19 | -4.174619 | 0.000727 | -3.568 | -2.921 | -2.599 |
| 20 | -6.712533 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 21 | -7.684404 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 22 | -6.917474 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 23 | -3.603114 | 0.005702 | -3.548 | -2.913 | -2.594 |
| 24 | -10.687146 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 25 | -7.601218 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 26 | -9.366595 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 27 | -7.434416 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 28 | -7.008647 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 29 | -5.593441 | 0.000001 | -3.551 | -2.914 | -2.595 |
| 30 | -7.245293 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 31 | -6.770616 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 32 | -6.574993 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 33 | -8.880827 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 34 | -4.176622 | 0.000722 | -3.548 | -2.913 | -2.594 |
| 35 | -9.334043 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 36 | -8.918422 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 37 | -6.661370 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 38 | -5.499863 | 0.000002 | -3.551 | -2.914 | -2.595 |
| 39 | -7.318254 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 40 | -1.919012 | 0.323171 | -3.563 | -2.919 | -2.597 |
| 41 | -8.188024 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 42 | -8.234837 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 43 | -4.779133 | 0.000060 | -3.548 | -2.913 | -2.594 |
| 44 | -8.717417 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 45 | -8.248489 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 46 | -4.014292 | 0.001337 | -3.558 | -2.917 | -2.596 |
| 47 | -2.285509 | 0.176684 | -3.555 | -2.916 | -2.596 |
| 48 | -8.361810 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 49 | -7.769269 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 50 | -7.310824 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 51 | -5.472582 | 0.000002 | -3.546 | -2.912 | -2.594 |
| 52 | -8.695394 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 53 | -8.750652 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 54 | -7.443134 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 55 | -8.205769 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 56 | -2.824305 | 0.054885 | -3.551 | -2.914 | -2.595 |
| 57 | -3.037337 | 0.031552 | -3.555 | -2.916 | -2.596 |
| 58 | -7.081131 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 59 | -7.180547 | 0.000000 | -3.546 | -2.912 | -2.594 |
#Stationarity test for second differences
from statsmodels.tsa.stattools import adfuller
second_difference_stationary_test=[]
for i in range(df_dif_2.shape[1]):
    #Note: df_dif_2.values[i] is row i of the array (one timestamp across the 60 stocks);
    #the series of the i-th stock would be df_dif_2.values[:, i]
    result=adfuller(df_dif_2.values[i])
    second_difference_stationary_test.append(result)
    print('Column2 no: %f' %i)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
| Column2 no | ADF Statistic | p-value | Critical value 1% | Critical value 5% | Critical value 10% |
|---|---|---|---|---|---|
| 0 | -7.500367 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 1 | -7.500367 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 2 | -7.500367 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 3 | -3.651316 | 0.004852 | -3.548 | -2.913 | -2.594 |
| 4 | -5.505116 | 0.000002 | -3.546 | -2.912 | -2.594 |
| 5 | -7.626145 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 6 | -9.723641 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 7 | -5.739843 | 0.000001 | -3.546 | -2.912 | -2.594 |
| 8 | -7.857847 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 9 | -8.194446 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 10 | -9.136396 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 11 | -5.999902 | 0.000000 | -3.551 | -2.914 | -2.595 |
| 12 | -9.886147 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 13 | -5.778694 | 0.000001 | -3.551 | -2.914 | -2.595 |
| 14 | -6.969938 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 15 | -7.708578 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 16 | -1.458240 | 0.554059 | -3.571 | -2.923 | -2.599 |
| 17 | -2.727714 | 0.069364 | -3.566 | -2.920 | -2.598 |
| 18 | -8.394921 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 19 | -7.917404 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 20 | -6.867701 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 21 | -8.623271 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 22 | -7.760141 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 23 | -7.131561 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 24 | -3.888958 | 0.002118 | -3.548 | -2.913 | -2.594 |
| 25 | -11.458508 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 26 | -10.361013 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 27 | -7.690150 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 28 | -8.239476 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 29 | -4.167237 | 0.000748 | -3.551 | -2.914 | -2.595 |
| 30 | -4.880950 | 0.000038 | -3.555 | -2.916 | -2.596 |
| 31 | -4.816709 | 0.000051 | -3.553 | -2.915 | -2.595 |
| 32 | -6.927399 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 33 | -7.618179 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 34 | -4.672030 | 0.000095 | -3.575 | -2.924 | -2.600 |
| 35 | -8.484573 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 36 | -9.206054 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 37 | -8.877339 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 38 | -2.696517 | 0.074643 | -3.558 | -2.917 | -2.596 |
| 39 | -6.197472 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 40 | -6.154092 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 41 | -4.485382 | 0.000209 | -3.563 | -2.919 | -2.597 |
| 42 | -7.575952 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 43 | -4.154510 | 0.000786 | -3.548 | -2.913 | -2.594 |
| 44 | -9.957188 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 45 | -3.569568 | 0.006370 | -3.555 | -2.916 | -2.596 |
| 46 | -6.972989 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 47 | -2.108435 | 0.241136 | -3.555 | -2.916 | -2.596 |
| 48 | -4.754949 | 0.000066 | -3.551 | -2.914 | -2.595 |
| 49 | -8.947385 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 50 | -7.845440 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 51 | -6.035620 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 52 | -7.814204 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 53 | -8.281382 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 54 | -3.385010 | 0.011481 | -3.553 | -2.915 | -2.595 |
| 55 | -8.184586 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 56 | -7.816258 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 57 | -8.907242 | 0.000000 | -3.546 | -2.912 | -2.594 |
| 58 | -4.377272 | 0.000326 | -3.551 | -2.914 | -2.595 |
| 59 | -6.602987 | 0.000000 | -3.546 | -2.912 | -2.594 |
#Stationarity test for raw data
from statsmodels.tsa.stattools import adfuller
stationary_test=[]
for i in range(df.values.shape[1]):
    #Note: df.values[i] is row i of the array (one timestamp across the 60 stocks);
    #the series of the i-th stock would be df.values[:, i]
    result=adfuller(df.values[i])
    stationary_test.append(result)
    print('Column no: %f' %i)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
| Column no | ADF Statistic | p-value | Critical value 1% | Critical value 5% | Critical value 10% |
|---|---|---|---|---|---|
| 0 | -7.479514 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 1 | -7.484154 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 2 | -7.482057 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 3 | -7.487830 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 4 | -7.490618 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 5 | -7.498728 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 6 | -7.501188 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 7 | -7.497547 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 8 | -7.505938 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 9 | -7.501499 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 10 | -7.487867 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 11 | -7.481673 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 12 | -7.475956 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 13 | -7.492711 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 14 | -7.483697 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 15 | -7.489679 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 16 | -7.504898 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 17 | -7.494967 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 18 | -7.491407 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 19 | -7.501281 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 20 | -7.487235 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 21 | -7.488775 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 22 | -7.475103 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 23 | -7.478036 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 24 | -7.472849 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 25 | -7.471032 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 26 | -7.476766 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 27 | -7.488429 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 28 | -7.473660 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 29 | -7.480347 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 30 | -7.471111 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 31 | -7.491417 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 32 | -7.484816 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 33 | -7.484799 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 34 | -7.479580 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 35 | -7.479928 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 36 | -7.487338 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 37 | -7.473418 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 38 | -7.474487 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 39 | -7.463117 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 40 | -7.468418 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 41 | -7.485016 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 42 | -7.479275 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 43 | -7.469214 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 44 | -7.474453 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 45 | -7.472985 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 46 | -7.464098 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 47 | -7.460705 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 48 | -7.462293 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 49 | -7.459727 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 50 | -7.458120 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 51 | -7.463083 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 52 | -7.467293 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 53 | -7.456245 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 54 | -7.465805 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 55 | -7.470211 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 56 | -7.467319 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 57 | -7.467453 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 58 | -7.474889 | 0.000000 | -3.548 | -2.913 | -2.594 |
| 59 | -7.464989 | 0.000000 | -3.548 | -2.913 | -2.594 |
Autocorrelation Test: Autocorrelation was tested for the raw data and for the first difference. No autocorrelation was detected. The corresponding charts are provided below.
#Autocorrelation
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf
auto_test=pd.DataFrame(data=np.zeros((6,60)),columns=df.columns)
for i in range(df.values.shape[1]):
    #Note: df.values[i] is row i of the array (one timestamp across the 60 stocks);
    #the series of the i-th stock would be df.values[:, i]
    result=acf(df.values[i],nlags=5)
    auto_test[df.columns[i]]=result
for j in range(0,6):
    fig, axes = plt.subplots(2,5, figsize=(15, 5))
    ax = axes.flatten()
    for i, col in enumerate(auto_test.columns[j*10:(j+1)*10]):
        #plot_acf computes an ACF of its input, which here is the 6 ACF values stored above
        plot_acf(auto_test[col], ax=ax[i])
        ax[i].set_title(col)
    fig.tight_layout(w_pad=6, h_pad=4)
    plt.show()
#Autocorrelation difference 1
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf
auto_test1=pd.DataFrame(data=np.zeros((6,60)),columns=df.columns)
for i in range(df.values.shape[1]):
    #Note: df_dif.values[i] is row i of the array (one timestamp across the 60 stocks);
    #the series of the i-th stock would be df_dif.values[:, i]
    result=acf(df_dif.values[i],nlags=5)
    auto_test1[df.columns[i]]=result
for j in range(0,6):
    fig, axes = plt.subplots(2,5, figsize=(15, 5))
    ax = axes.flatten()
    for i, col in enumerate(auto_test1.columns[j*10:(j+1)*10]):
        #plot_acf computes an ACF of its input, which here is the 6 ACF values stored above
        plot_acf(auto_test1[col], ax=ax[i])
        ax[i].set_title(col)
    fig.tight_layout(w_pad=6, h_pad=4)
    plt.show()
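A per-stock autocorrelation table can also be built with pandas alone. The sketch below is an illustration on synthetic data, with a hypothetical helper `acf_by_column`. It uses `Series.autocorr`, which is the Pearson correlation between a series and its lagged copy, so its values may differ slightly from those of statsmodels' `acf` estimator.

```python
import numpy as np
import pandas as pd


def acf_by_column(frame, max_lag=5):
    """Autocorrelation at lags 1..max_lag for every column of the frame."""
    lags = range(1, max_lag + 1)
    return pd.DataFrame(
        {col: [frame[col].autocorr(lag) for lag in lags] for col in frame.columns},
        index=pd.RangeIndex(1, max_lag + 1, name="lag"),
    )


# Synthetic examples: white noise (no autocorrelation) and its cumulative sum.
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
toy = pd.DataFrame({"white_noise": noise, "random_walk": noise.cumsum()})

acf_table = acf_by_column(toy)
print(acf_table)
```

Trending series such as the random walk show autocorrelations close to 1 at short lags, while differenced series without autocorrelation stay near 0. On the real data the calls would be `acf_by_column(df)` and `acf_by_column(df.diff())`.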
Summary Statistics for First Difference: Summary statistics are provided for the first difference, for its positive values, and for its negative values. No stock with unusual properties (a mean clearly above or below 0) was observed; all of the stocks seem to have a mean of approximately 0 and similar variance. Skewness, kurtosis, mode, median, standard deviation, and variance were measured for every stock, and none stands out. For the correlation analysis, the stock with the highest mean ('ISYAT') will be used.
#Summary statistics for first difference
data=df.diff()
statistics_positive=data[data>0]
statistics_positive.describe()
| | AEFES | AKBNK | AKSA | AKSEN | ALARK | ALBRK | ANACM | ARCLK | ASELS | ASUZU | ... | TTKOM | TUKAS | TUPRS | USAK | VAKBN | VESTL | YATAS | YKBNK | YUNSA | ZOREN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 18144.000000 | 18594.000000 | 17442.000000 | 14070.000000 | 14636.000000 | 11341.000000 | 13162.000000 | 17944.000000 | 17194.000000 | 16983.000000 | ... | 16656.000000 | 12119.000000 | 19413.000000 | 11599.000000 | 16946.000000 | 16303.000000 | 14887.000000 | 15517.000000 | 14877.000000 | 12066.000000 |
| mean | 0.002707 | 0.002222 | 0.001703 | 0.002982 | 0.002516 | 0.009556 | 0.002669 | 0.001989 | 0.000935 | 0.001958 | ... | 0.002721 | 0.002809 | 0.001250 | 0.003858 | 0.002441 | 0.001991 | 0.001331 | 0.002802 | 0.002395 | 0.004323 |
| std | 0.007720 | 0.006448 | 0.002719 | 0.008132 | 0.005679 | 0.006165 | 0.003865 | 0.003884 | 0.001895 | 0.004641 | ... | 0.007582 | 0.002822 | 0.002360 | 0.004207 | 0.006930 | 0.003166 | 0.002288 | 0.007681 | 0.005221 | 0.005006 |
| min | 0.000323 | 0.000825 | 0.000212 | 0.001927 | 0.000996 | 0.006698 | 0.001256 | 0.000341 | 0.000086 | 0.000137 | ... | 0.001143 | 0.001898 | 0.000047 | 0.001197 | 0.001187 | 0.000688 | 0.000262 | 0.001440 | 0.000514 | 0.002865 |
| 25% | 0.001326 | 0.000890 | 0.000556 | 0.001927 | 0.001167 | 0.007643 | 0.001513 | 0.000757 | 0.000214 | 0.000654 | ... | 0.001361 | 0.001898 | 0.000486 | 0.002937 | 0.001279 | 0.000688 | 0.000272 | 0.001617 | 0.000997 | 0.002906 |
| 50% | 0.001645 | 0.001693 | 0.000999 | 0.001927 | 0.002077 | 0.007986 | 0.001571 | 0.001676 | 0.000518 | 0.000988 | ... | 0.002150 | 0.001898 | 0.000729 | 0.002937 | 0.001319 | 0.000688 | 0.000534 | 0.001642 | 0.001102 | 0.003357 |
| 75% | 0.003220 | 0.002670 | 0.002189 | 0.003854 | 0.002675 | 0.008587 | 0.002741 | 0.002270 | 0.000855 | 0.001963 | ... | 0.002721 | 0.001898 | 0.001436 | 0.002973 | 0.002598 | 0.002063 | 0.001602 | 0.003209 | 0.002635 | 0.004093 |
| max | 0.977849 | 0.842765 | 0.228543 | 0.932563 | 0.633515 | 0.145985 | 0.364877 | 0.468312 | 0.116361 | 0.443275 | ... | 0.920502 | 0.085389 | 0.252894 | 0.367625 | 0.870774 | 0.161623 | 0.058887 | 0.929939 | 0.509551 | 0.443448 |
8 rows × 60 columns
statistics=df.diff().describe()
statistics
| AEFES | AKBNK | AKSA | AKSEN | ALARK | ALBRK | ANACM | ARCLK | ASELS | ASUZU | ... | TTKOM | TUKAS | TUPRS | USAK | VAKBN | VESTL | YATAS | YKBNK | YUNSA | ZOREN | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | ... | 50011.000000 | 50011.000000 | 50011.000000 | 5.001100e+04 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 | 50011.000000 |
| mean | -0.000001 | 0.000005 | 0.000010 | -0.000005 | 0.000010 | 0.000001 | 0.000010 | 0.000010 | 0.000006 | 0.000007 | ... | 0.000004 | 0.000013 | 0.000015 | 8.555974e-08 | 0.000003 | 0.000011 | 0.000009 | 0.000001 | 0.000004 | 0.000003 |
| std | 0.006757 | 0.005733 | 0.002549 | 0.006501 | 0.004625 | 0.007496 | 0.003358 | 0.003520 | 0.001846 | 0.003756 | ... | 0.006516 | 0.002702 | 0.002348 | 3.843752e-03 | 0.005953 | 0.002836 | 0.001974 | 0.006226 | 0.004380 | 0.004523 |
| min | -0.899968 | -0.786785 | -0.172514 | -0.934489 | -0.599767 | -0.128811 | -0.363564 | -0.403092 | -0.159728 | -0.355015 | ... | -0.920502 | -0.079696 | -0.251678 | -3.646880e-01 | -0.836453 | -0.160935 | -0.049791 | -0.851086 | -0.544692 | -0.443448 |
| 25% | -0.001364 | -0.000934 | -0.000589 | -0.001927 | -0.001138 | 0.000000 | -0.001342 | -0.001135 | -0.000415 | -0.000818 | ... | -0.001361 | -0.001898 | -0.000530 | 0.000000e+00 | -0.001293 | -0.000688 | -0.000272 | -0.001617 | -0.000997 | -0.002865 |
| 50% | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 75% | 0.001347 | 0.000934 | 0.000589 | 0.001927 | 0.001138 | 0.000000 | 0.001314 | 0.001135 | 0.000312 | 0.000661 | ... | 0.001361 | 0.000000 | 0.000531 | 0.000000e+00 | 0.001279 | 0.000688 | 0.000262 | 0.001617 | 0.000966 | 0.000000 |
| max | 0.977849 | 0.842765 | 0.228543 | 0.932563 | 0.633515 | 0.145985 | 0.364877 | 0.468312 | 0.116361 | 0.443275 | ... | 0.920502 | 0.085389 | 0.252894 | 3.676252e-01 | 0.870774 | 0.161623 | 0.058887 | 0.929939 | 0.509551 | 0.443448 |
8 rows × 60 columns
data=df.diff()
statistics_negative=data[data<0]
statistics_negative.describe()
| AEFES | AKBNK | AKSA | AKSEN | ALARK | ALBRK | ANACM | ARCLK | ASELS | ASUZU | ... | TTKOM | TUKAS | TUPRS | USAK | VAKBN | VESTL | YATAS | YKBNK | YUNSA | ZOREN | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 18360.000000 | 18521.000000 | 18434.000000 | 14764.000000 | 15314.000000 | 11514.000000 | 13450.000000 | 18260.000000 | 18319.000000 | 19401.000000 | ... | 16947.000000 | 12643.000000 | 19459.000000 | 12070.000000 | 17120.000000 | 18266.000000 | 15507.000000 | 15643.000000 | 16053.000000 | 12650.000000 |
| mean | -0.002679 | -0.002216 | -0.001585 | -0.002860 | -0.002372 | -0.009407 | -0.002574 | -0.001927 | -0.000860 | -0.001696 | ... | -0.002664 | -0.002642 | -0.001209 | -0.003707 | -0.002409 | -0.001747 | -0.001248 | -0.002776 | -0.002207 | -0.004111 |
| std | 0.007146 | 0.006094 | 0.002318 | 0.007970 | 0.005230 | 0.005310 | 0.003703 | 0.003392 | 0.002091 | 0.003359 | ... | 0.007377 | 0.002591 | 0.002365 | 0.004020 | 0.006656 | 0.002548 | 0.002069 | 0.007066 | 0.004932 | 0.004718 |
| min | -0.899968 | -0.786785 | -0.172514 | -0.934489 | -0.599767 | -0.128811 | -0.363564 | -0.403092 | -0.159728 | -0.355015 | ... | -0.920502 | -0.079696 | -0.251678 | -0.364688 | -0.836453 | -0.160935 | -0.049791 | -0.851086 | -0.544692 | -0.443448 |
| 25% | -0.003220 | -0.002616 | -0.001984 | -0.003854 | -0.002675 | -0.008587 | -0.002741 | -0.002191 | -0.000853 | -0.001963 | ... | -0.002721 | -0.001898 | -0.001325 | -0.002973 | -0.002598 | -0.002063 | -0.001340 | -0.003209 | -0.002173 | -0.004093 |
| 50% | -0.001642 | -0.001683 | -0.000999 | -0.001927 | -0.001394 | -0.007986 | -0.001542 | -0.001676 | -0.000428 | -0.000962 | ... | -0.002150 | -0.001898 | -0.000718 | -0.002937 | -0.001319 | -0.000688 | -0.000534 | -0.001642 | -0.001081 | -0.003357 |
| 75% | -0.001326 | -0.000890 | -0.000549 | -0.001927 | -0.001167 | -0.007643 | -0.001513 | -0.000757 | -0.000214 | -0.000654 | ... | -0.001361 | -0.001898 | -0.000486 | -0.002937 | -0.001279 | -0.000688 | -0.000272 | -0.001617 | -0.000997 | -0.002906 |
| max | -0.000323 | -0.000825 | -0.000212 | -0.001927 | -0.000996 | -0.003177 | -0.001256 | -0.000344 | -0.000098 | -0.000190 | ... | -0.001143 | -0.001898 | -0.000048 | -0.002937 | -0.001187 | -0.000688 | -0.000262 | -0.001440 | -0.000514 | -0.002865 |
8 rows × 60 columns
statistics.iloc[1,:].idxmax()
'ISYAT'
data.mode()
| AEFES | AKBNK | AKSA | AKSEN | ALARK | ALBRK | ANACM | ARCLK | ASELS | ASUZU | ... | TTKOM | TUKAS | TUPRS | USAK | VAKBN | VESTL | YATAS | YKBNK | YUNSA | ZOREN | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 rows × 60 columns
data.median()
AEFES 0.0 AKBNK 0.0 AKSA 0.0 AKSEN 0.0 ALARK 0.0 ALBRK 0.0 ANACM 0.0 ARCLK 0.0 ASELS 0.0 ASUZU 0.0 AYGAZ 0.0 BAGFS 0.0 BANVT 0.0 BRISA 0.0 CCOLA 0.0 CEMAS 0.0 ECILC 0.0 EREGL 0.0 FROTO 0.0 GARAN 0.0 GOODY 0.0 GUBRF 0.0 HALKB 0.0 ICBCT 0.0 ISCTR 0.0 ISDMR 0.0 ISFIN 0.0 ISYAT 0.0 KAREL 0.0 KARSN 0.0 KCHOL 0.0 KRDMB 0.0 KRDMD 0.0 MGROS 0.0 OTKAR 0.0 PARSN 0.0 PETKM 0.0 PGSUS 0.0 PRKME 0.0 SAHOL 0.0 SASA 0.0 SISE 0.0 SKBNK 0.0 SODA 0.0 TCELL 0.0 THYAO 0.0 TKFEN 0.0 TOASO 0.0 TRKCM 0.0 TSKB 0.0 TTKOM 0.0 TUKAS 0.0 TUPRS 0.0 USAK 0.0 VAKBN 0.0 VESTL 0.0 YATAS 0.0 YKBNK 0.0 YUNSA 0.0 ZOREN 0.0 dtype: float64
data.var()
AEFES 0.000046 AKBNK 0.000033 AKSA 0.000006 AKSEN 0.000042 ALARK 0.000021 ALBRK 0.000056 ANACM 0.000011 ARCLK 0.000012 ASELS 0.000003 ASUZU 0.000014 AYGAZ 0.000012 BAGFS 0.000012 BANVT 0.000007 BRISA 0.000014 CCOLA 0.000036 CEMAS 0.000011 ECILC 0.000008 EREGL 0.000004 FROTO 0.000009 GARAN 0.000028 GOODY 0.000073 GUBRF 0.000027 HALKB 0.000041 ICBCT 0.000008 ISCTR 0.000029 ISDMR 0.000006 ISFIN 0.000005 ISYAT 0.000014 KAREL 0.000008 KARSN 0.000020 KCHOL 0.000016 KRDMB 0.000024 KRDMD 0.000008 MGROS 0.000031 OTKAR 0.000013 PARSN 0.000005 PETKM 0.000005 PGSUS 0.000010 PRKME 0.000031 SAHOL 0.000039 SASA 0.000006 SISE 0.000006 SKBNK 0.000043 SODA 0.000003 TCELL 0.000012 THYAO 0.000009 TKFEN 0.000005 TOASO 0.000008 TRKCM 0.000006 TSKB 0.000039 TTKOM 0.000042 TUKAS 0.000007 TUPRS 0.000006 USAK 0.000015 VAKBN 0.000035 VESTL 0.000008 YATAS 0.000004 YKBNK 0.000039 YUNSA 0.000019 ZOREN 0.000020 dtype: float64
data.std()
AEFES 0.006757 AKBNK 0.005733 AKSA 0.002549 AKSEN 0.006501 ALARK 0.004625 ALBRK 0.007496 ANACM 0.003358 ARCLK 0.003520 ASELS 0.001846 ASUZU 0.003756 AYGAZ 0.003406 BAGFS 0.003530 BANVT 0.002713 BRISA 0.003807 CCOLA 0.006014 CEMAS 0.003252 ECILC 0.002779 EREGL 0.002114 FROTO 0.002977 GARAN 0.005280 GOODY 0.008538 GUBRF 0.005215 HALKB 0.006411 ICBCT 0.002740 ISCTR 0.005395 ISDMR 0.002459 ISFIN 0.002244 ISYAT 0.003785 KAREL 0.002781 KARSN 0.004477 KCHOL 0.004030 KRDMB 0.004923 KRDMD 0.002869 MGROS 0.005533 OTKAR 0.003643 PARSN 0.002145 PETKM 0.002228 PGSUS 0.003188 PRKME 0.005564 SAHOL 0.006225 SASA 0.002383 SISE 0.002451 SKBNK 0.006570 SODA 0.001810 TCELL 0.003500 THYAO 0.003082 TKFEN 0.002324 TOASO 0.002876 TRKCM 0.002469 TSKB 0.006262 TTKOM 0.006516 TUKAS 0.002702 TUPRS 0.002348 USAK 0.003844 VAKBN 0.005953 VESTL 0.002836 YATAS 0.001974 YKBNK 0.006226 YUNSA 0.004380 ZOREN 0.004523 dtype: float64
#Positive skew: longer right tail; negative skew: longer left tail
data.skew()
AEFES 13.342564 AKBNK 11.810276 AKSA 8.425760 AKSEN -0.414508 ALARK 7.871218 ALBRK 0.611683 ANACM 0.473930 ARCLK 17.144157 ASELS -12.528670 ASUZU 17.781964 AYGAZ 29.146739 BAGFS -115.506753 BANVT 2.347752 BRISA 0.132926 CCOLA 6.408829 CEMAS -11.343028 ECILC 6.846357 EREGL 0.318090 FROTO 4.478731 GARAN 10.408979 GOODY -0.005829 GUBRF 0.454105 HALKB 11.204455 ICBCT -10.998863 ISCTR 18.982775 ISDMR 6.174451 ISFIN -15.173323 ISYAT 18.462930 KAREL 0.974498 KARSN 0.494337 KCHOL 9.006304 KRDMB 8.366308 KRDMD 5.092583 MGROS 2.605187 OTKAR 10.392063 PARSN -6.360402 PETKM 0.308251 PGSUS -0.061565 PRKME -0.707580 SAHOL 8.260224 SASA 1.059227 SISE 20.688893 SKBNK 0.236488 SODA 5.524513 TCELL 52.991215 THYAO 0.299846 TKFEN 4.089045 TOASO 14.968348 TRKCM 35.805049 TSKB 0.721176 TTKOM 0.487506 TUKAS 0.195159 TUPRS 0.176209 USAK 0.360454 VAKBN 7.097173 VESTL 1.489574 YATAS 1.009304 YKBNK 15.532139 YUNSA -5.844550 ZOREN -0.067330 dtype: float64
#Excess kurtosis: negative means a flatter peak and lighter tails than a normal distribution, positive means a sharper peak and heavier tails
df_dif_num.kurtosis()
AEFES -1.629918 AKBNK -1.652585 AKSA -1.604290 AKSEN -1.264739 ALARK -1.329400 ALBRK -0.811784 ANACM -1.120550 ARCLK -1.618506 ASELS -1.589496 ASUZU -1.615060 AYGAZ -1.568923 BAGFS -1.509655 BANVT -1.495501 BRISA -1.505834 CCOLA -1.709216 CEMAS -0.791281 ECILC -1.198456 EREGL -1.537100 FROTO -1.709151 GARAN -1.660417 GOODY -1.475634 GUBRF -1.416043 HALKB -1.560887 ICBCT -1.072661 ISCTR -1.524183 ISDMR 2.618904 ISFIN -0.607593 ISYAT -0.021914 KAREL -1.261153 KARSN -0.833144 KCHOL -1.655714 KRDMB -1.165188 KRDMD -1.101239 MGROS -1.631327 OTKAR -1.698935 PARSN -1.492878 PETKM -1.393785 PGSUS -1.596284 PRKME -1.285143 SAHOL -1.632533 SASA -1.317361 SISE -1.416988 SKBNK -0.817996 SODA -1.382325 TCELL -1.547758 THYAO -1.645781 TKFEN -1.629206 TOASO -1.644768 TRKCM -1.242317 TSKB -0.817556 TTKOM -1.511603 TUKAS -0.979836 TUPRS -1.713499 USAK -0.886661 VAKBN -1.531930 VESTL -1.546372 YATAS -1.353930 YKBNK -1.395036 YUNSA -1.380651 ZOREN -0.975972 dtype: float64
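The kurtosis values above are nearly all close to -1.6, which is what a series holding only -1, 0 and 1 would give. The following is a minimal sketch, assuming that `df_dif_num` stores the sign of each price change (as its `head()` output suggests). For a symmetric three-valued series in which a share `q` of the entries is non-zero, the excess kurtosis is `1/q - 3`. The value `q = 0.73` is an illustrative assumption based on the AEFES counts above (18144 positive and 18360 negative changes out of 50011), and the data are simulated, not taken from the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Assumed share of non-zero changes (illustrative, roughly the AEFES share).
q = 0.73

# Simulated sign series: -1 and +1 each with probability q/2, 0 otherwise.
signs = pd.Series(
    rng.choice([-1.0, 0.0, 1.0], size=50_000, p=[q / 2, 1 - q, q / 2])
)

# pandas reports excess kurtosis (a normal distribution gives about 0).
sample_kurtosis = signs.kurtosis()

# Variance and fourth moment both equal q, so kurtosis is 1/q and excess is 1/q - 3.
theoretical_kurtosis = 1.0 / q - 3.0

print(round(sample_kurtosis, 3), round(theoretical_kurtosis, 3))
```

Under this assumption the negative values mostly reflect how often a price changes between ticks, not the tail behaviour of the changes themselves; calling `kurtosis()` on `data` would describe the latter.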
Correlation analysis: Three stocks of interest were selected (ALBRK, TUKAS, ISYAT). Their pairwise rolling correlations over 30 daily observations are calculated, and the windows with the highest and lowest correlation, together with the days of the largest correlation increase and decrease, are reported for the Google Trends analysis. Daily high prices were used instead of the 15-minute data to make the series easier to interpret.
#Correlation of selected stock
from datetime import datetime,date,timedelta
df_daily=df.resample('1d').max()
df_daily=df_daily.dropna()
cor=df_daily['ALBRK'].rolling(30).corr(df_daily['TUKAS'])
print('Highest correlation window',cor.idxmax(),cor.idxmax()+timedelta(days=30))
print('Lowest correlation window',cor.idxmin(),cor.idxmin()+timedelta(days=30))
print('Highest correlation increase',cor.diff().idxmax())
print('Highest correlation decrease',cor.diff().idxmin())
cor.plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('date', fontsize=20)
plt.show()
sns.kdeplot(data=cor).set_title('Correlation distribution')
plt.show()
sns.histplot(data=cor).set_title('Correlation histogram')
Highest correlation window 2013-09-09 00:00:00+00:00 2013-10-09 00:00:00+00:00
Lowest correlation window 2015-07-23 00:00:00+00:00 2015-08-22 00:00:00+00:00
Highest correlation increase 2012-12-05 00:00:00+00:00
Highest correlation decrease 2015-09-18 00:00:00+00:00
Text(0.5, 1.0, 'Correlation histogram')
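One caveat when reading the printed windows: by default pandas labels each `rolling(30)` result with the last row of its window, and the window spans 30 rows (trading days here) rather than 30 calendar days. The dates printed above may therefore describe the period after the window rather than the window itself. The following is a minimal sketch on synthetic data of recovering the first and last date of the best window; the helper name `rolling_corr_window` and the simulated series are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def rolling_corr_window(a, b, window=30):
    """Return the rolling correlation and the bounds of its best window."""
    cor = a.rolling(window).corr(b)
    # idxmax skips the leading NaN values and returns the label of the
    # LAST row of the window with the highest correlation.
    end = cor.idxmax()
    end_pos = cor.index.get_loc(end)
    # The window covers the `window` rows ending at `end`.
    start = cor.index[end_pos - window + 1]
    return cor, start, end


rng = np.random.default_rng(0)
idx = pd.bdate_range("2020-01-01", periods=120, tz="UTC")
x = pd.Series(rng.normal(size=120).cumsum(), index=idx)
y = pd.Series(rng.normal(size=120).cumsum(), index=idx)

cor, start, end = rolling_corr_window(x, y)
print("Highest correlation window", start, end)
```

Slicing with `cor.loc[start:end]` then returns exactly the 30 rows that produced the maximum.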
from datetime import datetime,date,timedelta
df_daily=df.resample('1d').max()
df_daily=df_daily.dropna()
cor=df_daily['ALBRK'].rolling(30).corr(df_daily['ISYAT'])
print('Highest correlation window',cor.idxmax(),cor.idxmax()+timedelta(days=30))
print('Lowest correlation window',cor.idxmin(),cor.idxmin()+timedelta(days=30))
print('Highest correlation increase',cor.diff().idxmax())
print('Highest correlation decrease',cor.diff().idxmin())
cor.plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('date', fontsize=20)
plt.show()
sns.kdeplot(data=cor).set_title('Correlation distribution')
plt.show()
sns.histplot(data=cor).set_title('Correlation histogram')
Highest correlation window 2013-11-12 00:00:00+00:00 2013-12-12 00:00:00+00:00
Lowest correlation window 2014-04-18 00:00:00+00:00 2014-05-18 00:00:00+00:00
Highest correlation increase 2014-05-13 00:00:00+00:00
Highest correlation decrease 2014-03-28 00:00:00+00:00
Text(0.5, 1.0, 'Correlation histogram')
from datetime import datetime,date,timedelta
df_daily=df.resample('1d').max()
df_daily=df_daily.dropna()
cor=df_daily['TUKAS'].rolling(30).corr(df_daily['ISYAT'])
print('Highest correlation window',cor.idxmax(),cor.idxmax()+timedelta(days=30))
print('Lowest correlation window',cor.idxmin(),cor.idxmin()+timedelta(days=30))
print('Highest correlation increase',cor.diff().idxmax())
print('Highest correlation decrease',cor.diff().idxmin())
cor.plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('date', fontsize=20)
plt.show()
sns.kdeplot(data=cor).set_title('Correlation distribution')
plt.show()
sns.histplot(data=cor).set_title('Correlation histogram')
Highest correlation window 2019-02-27 00:00:00+00:00 2019-03-29 00:00:00+00:00
Lowest correlation window 2016-02-02 00:00:00+00:00 2016-03-03 00:00:00+00:00
Highest correlation increase 2012-12-05 00:00:00+00:00
Highest correlation decrease 2017-03-29 00:00:00+00:00
Text(0.5, 1.0, 'Correlation histogram')
Principal Component Analysis: PCA was applied to the data. The principal components and the share of variance explained by each are provided, and the data points were transformed to the new coordinate system created by PCA. However, this dataset is not well suited to PCA: it is non-stationary and likely contains non-linear relations between variables. PCA rotates the axes of the coordinate system so that the first component captures the highest possible variance and the last captures the least. If the data are non-stationary, PCA learns to compress the information in the given segment rather than in the underlying process. PCA also assumes linear relationships between variables; non-linear relationships end up in the less important axes, and discarding those axes loses information. On this dataset PCA therefore learns a misleading representation because of non-stationarity and loses information because of non-linear relations. Multivariate, non-stationary time series with non-linear relationships between variables are not suitable for PCA, and it should not be applied to them directly. After removing the non-stationarity, CCA or autoencoders may be used instead. Nearly all of the variance is explained by a few latent components (the first three account for about 85%). There are two possible explanations: either the data are random and carry little valuable information, or, because of non-linearity, PCA cannot highlight the high-variance latent components.
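The effect of non-stationarity on PCA can be illustrated with a minimal sketch on simulated data (not the stock dataset). Ten independent random walks share no common factor, yet PCA on their levels assigns a large share of the variance to the first component, while PCA on their differences spreads the variance almost evenly. The helper `explained_variance_ratio` is a hypothetical NumPy stand-in that computes the same quantity as scikit-learn's `explained_variance_ratio_`.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                    # center each column
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    var = s ** 2
    return var / var.sum()


rng = np.random.default_rng(42)
steps = rng.normal(size=(5000, 10))   # independent increments
levels = steps.cumsum(axis=0)         # non-stationary random walks
changes = np.diff(levels, axis=0)     # stationary differences

ratio_levels = explained_variance_ratio(levels)
ratio_changes = explained_variance_ratio(changes)

print("levels :", ratio_levels[:3].round(3))
print("changes:", ratio_changes[:3].round(3))
```

A dominant first component on price levels is therefore not, by itself, evidence of a shared driver; repeating the analysis on differenced data is one way to check.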
#Principal component analysis
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(df.values)
PCA()
plt.hist(pca.explained_variance_ratio_)
x=pca.transform(df.values)#principal components values
print(pca.explained_variance_ratio_)
print(pca.singular_values_)
[6.58863790e-01 1.07485607e-01 8.43247899e-02 3.53499550e-02 2.74745561e-02 1.72708027e-02 1.03153939e-02 8.63891371e-03 6.48636013e-03 5.24644083e-03 4.52317389e-03 4.25063885e-03 3.66752204e-03 2.47334962e-03 2.31441953e-03 2.03373472e-03 1.96175717e-03 1.71469848e-03 1.39712113e-03 1.08722634e-03 1.03570941e-03 9.99550260e-04 8.93125883e-04 7.81297661e-04 7.39177873e-04 6.88579933e-04 6.47041397e-04 6.18726677e-04 5.67825161e-04 5.32537973e-04 4.90834089e-04 4.02587758e-04 3.79261767e-04 3.67149030e-04 3.59495875e-04 3.05299704e-04 2.79702928e-04 2.63909073e-04 2.49743125e-04 2.41852386e-04 2.21878632e-04 1.89886327e-04 1.83216237e-04 1.78608571e-04 1.59028334e-04 1.56201884e-04 1.36731662e-04 1.30746877e-04 1.14151929e-04 1.09685630e-04 1.02515131e-04 9.65884054e-05 8.35535741e-05 7.45783654e-05 7.09292045e-05 6.99109818e-05 6.25604504e-05 5.88397463e-05 4.01747997e-05 3.65543704e-05]
[246.0554475 99.38259242 88.0263711 56.99406028 50.24588573 39.83744036 30.78775864 28.17505241 24.41381748 21.95672457 20.38716353 19.76342765 18.3578379 15.07571503 14.58331332 13.67043442 13.42634458 12.55246579 11.33058139 9.99528418 9.75560314 9.58379427 9.05923451 8.47312135 8.24156415 7.9544901 7.71083124 7.54022986 7.22341357 6.99536621 6.71587375 6.08226652 5.90343378 5.80839786 5.74754156 5.29661439 5.06971646 4.92450208 4.79051204 4.7142255 4.51536544 4.17716867 4.10314764 4.05122452 3.82271938 3.78859602 3.54462022 3.46617772 3.23874527 3.17475366 3.06922799 2.9791863 2.77087976 2.61783093 2.55298174 2.53459086 2.3976462 2.32525486 1.92137451 1.83275671]
df_dif_num.head()
| AEFES | AKBNK | AKSA | AKSEN | ALARK | ALBRK | ANACM | ARCLK | ASELS | ASUZU | ... | TTKOM | TUKAS | TUPRS | USAK | VAKBN | VESTL | YATAS | YKBNK | YUNSA | ZOREN | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| timestamp | |||||||||||||||||||||
| 2012-09-17 06:45:00+00:00 | 0.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -1.0 | 0.0 | 1.0 | ... | -1.0 | 0.0 | -1.0 | 0.0 | -1.0 | 0.0 | 1.0 | -1.0 | -1.0 | 0.0 |
| 2012-09-17 07:00:00+00:00 | 0.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -1.0 | 0.0 | 1.0 | ... | -1.0 | 0.0 | -1.0 | 0.0 | -1.0 | 0.0 | 1.0 | -1.0 | -1.0 | 0.0 |
| 2012-09-17 07:15:00+00:00 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | -1.0 | ... | 0.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2012-09-17 07:30:00+00:00 | 0.0 | -1.0 | 0.0 | 0.0 | -1.0 | -1.0 | 0.0 | -1.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | -1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2012-09-17 07:45:00+00:00 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | -1.0 | 0.0 | 0.0 | 0.0 | -1.0 |
5 rows × 60 columns
Google Trends: The search volumes of the selected companies in the highest-correlation windows are provided. Where the plots show a day with a considerable change, that day is analyzed.
google_trend=pd.read_csv("C:/Users/kteke/Downloads/multiTimeline (1).csv", header=1,index_col=0)
# Inspect data
print(google_trend.head())
google_trend.plot()
google_trend.corr()
            İş Yatırım: (Türkiye)  Albaraka Türk Katılım Bankası: (Türkiye)
Gün
2013-09-09                     17                                        77
2013-09-10                     19                                        69
2013-09-11                     21                                        71
2013-09-12                     27                                        57
2013-09-13                     28                                        64
| İş Yatırım: (Türkiye) | Albaraka Türk Katılım Bankası: (Türkiye) | |
|---|---|---|
| İş Yatırım: (Türkiye) | 1.000000 | 0.742652 |
| Albaraka Türk Katılım Bankası: (Türkiye) | 0.742652 | 1.000000 |
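As can be seen, the search volumes of the two companies are strongly correlated (about 0.74) over this period. Note that the series starts on 2013-09-09, which is the start of the highest-correlation window reported above for ALBRK and TUKAS; the corresponding window for ALBRK and ISYAT starts on 2013-11-12.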
google_trend_2=pd.read_csv("C:/Users/kteke/Downloads/multiTimeline (2).csv", header=1,index_col=0)
# Inspect data
print(google_trend_2.head())
google_trend_2.plot()
google_trend_2.corr()
İş Yatırım: (Türkiye) \
Gün
2019-02-27 64
2019-02-28 81
2019-03-01 45
2019-03-02 27
2019-03-03 23
TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye)
Gün
2019-02-27 0
2019-02-28 19
2019-03-01 0
2019-03-02 0
2019-03-03 0
| İş Yatırım: (Türkiye) | TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye) | |
|---|---|---|
| İş Yatırım: (Türkiye) | 1.000000 | 0.021554 |
| TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye) | 0.021554 | 1.000000 |
Although the two stocks are strongly correlated in this period, their search volumes are not (correlation of about 0.02).
google_trend_3=pd.read_csv("C:/Users/kteke/Downloads/multiTimeline (3).csv", header=1,index_col=0)
# Inspect data
print(google_trend_3.head())
google_trend_3.plot()
google_trend_3.corr()
Albaraka Türk Katılım Bankası: (Türkiye) \
Gün
2013-09-09 77
2013-09-10 69
2013-09-11 70
2013-09-12 61
2013-09-13 68
TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye)
Gün
2013-09-09 0
2013-09-10 0
2013-09-11 0
2013-09-12 7
2013-09-13 0
| Albaraka Türk Katılım Bankası: (Türkiye) | TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye) | |
|---|---|---|
| Albaraka Türk Katılım Bankası: (Türkiye) | 1.000000 | -0.124751 |
| TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye) | -0.124751 | 1.000000 |
Although the two stocks are strongly correlated in this period, their search volumes are not (correlation of about -0.12).
Next, we investigate the changes in search volume for the words "borsa" (stock exchange), "yatırım" (investment), and "para" (money) in the vicinity of the most volatile day.
df_dif.idxmax()
AEFES 2013-05-07 06:30:00+00:00 AKBNK 2013-05-07 06:30:00+00:00 AKSA 2013-05-07 06:30:00+00:00 AKSEN 2013-05-07 06:30:00+00:00 ALARK 2013-05-07 06:30:00+00:00 ALBRK 2019-02-13 07:00:00+00:00 ANACM 2013-05-07 06:30:00+00:00 ARCLK 2013-05-07 06:30:00+00:00 ASELS 2013-05-07 06:30:00+00:00 ASUZU 2013-05-07 06:30:00+00:00 AYGAZ 2013-05-07 06:30:00+00:00 BAGFS 2013-05-07 06:30:00+00:00 BANVT 2013-05-07 06:30:00+00:00 BRISA 2013-05-07 06:30:00+00:00 CCOLA 2013-05-07 06:30:00+00:00 CEMAS 2013-05-07 06:30:00+00:00 ECILC 2013-05-07 06:30:00+00:00 EREGL 2013-05-07 06:30:00+00:00 FROTO 2013-05-07 06:30:00+00:00 GARAN 2013-05-07 06:30:00+00:00 GOODY 2015-07-15 12:00:00+00:00 GUBRF 2013-12-05 15:30:00+00:00 HALKB 2013-05-07 06:30:00+00:00 ICBCT 2013-05-07 06:30:00+00:00 ISCTR 2013-05-07 06:30:00+00:00 ISDMR 2018-03-30 11:00:00+00:00 ISFIN 2013-05-07 06:30:00+00:00 ISYAT 2013-05-07 06:30:00+00:00 KAREL 2013-05-07 06:30:00+00:00 KARSN 2013-05-07 06:30:00+00:00 KCHOL 2013-05-07 06:30:00+00:00 KRDMB 2013-05-07 06:30:00+00:00 KRDMD 2013-05-07 06:30:00+00:00 MGROS 2013-05-07 06:30:00+00:00 OTKAR 2013-05-07 06:30:00+00:00 PARSN 2013-05-07 06:30:00+00:00 PETKM 2013-05-07 06:30:00+00:00 PGSUS 2013-05-07 06:30:00+00:00 PRKME 2013-05-07 06:30:00+00:00 SAHOL 2013-05-07 06:30:00+00:00 SASA 2018-06-25 06:45:00+00:00 SISE 2013-05-07 06:30:00+00:00 SKBNK 2013-05-07 06:30:00+00:00 SODA 2013-05-07 06:30:00+00:00 TCELL 2013-05-07 06:30:00+00:00 THYAO 2013-05-07 06:30:00+00:00 TKFEN 2013-05-07 06:30:00+00:00 TOASO 2013-05-07 06:30:00+00:00 TRKCM 2013-05-07 06:30:00+00:00 TSKB 2013-05-07 06:30:00+00:00 TTKOM 2013-05-07 06:30:00+00:00 TUKAS 2019-03-04 07:00:00+00:00 TUPRS 2013-05-07 06:30:00+00:00 USAK 2013-05-07 06:30:00+00:00 VAKBN 2013-05-07 06:30:00+00:00 VESTL 2013-05-07 06:30:00+00:00 YATAS 2017-11-20 06:45:00+00:00 YKBNK 2013-05-07 06:30:00+00:00 YUNSA 2013-05-07 06:30:00+00:00 ZOREN 2013-05-07 06:30:00+00:00 dtype: datetime64[ns, UTC]
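In the output above, most stocks reach their largest change at the same timestamp. The following is a minimal sketch of extracting that shared timestamp programmatically instead of reading it off the listing. The frame `changes`, its column names, and the injected shock are synthetic assumptions, standing in for `df_dif`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2013-05-06", periods=200, freq="15min", tz="UTC")

# Synthetic 15-minute changes for five hypothetical stocks.
changes = pd.DataFrame(
    rng.normal(scale=0.001, size=(200, 5)), index=idx, columns=list("ABCDE")
)

# Inject a common shock into four of the five columns.
shock_time = idx[120]
changes.loc[shock_time, ["A", "B", "C", "D"]] = 0.5

# Timestamp of the largest change per column, then the most frequent one.
peak_times = changes.idxmax()
counts = peak_times.value_counts()
most_common_peak = counts.idxmax()

print(most_common_peak, "shared by", counts.max(), "columns")
```

Applied to `df_dif`, the same two calls would return the timestamp shared by the majority of stocks and how many stocks share it.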
google_trend_4=pd.read_csv("C:/Users/kteke/Downloads/multiTimeline (4).csv", header=1,index_col=0)
# Inspect data
google_trend_4.plot()
<AxesSubplot:xlabel='Gün'>
An increase in the search volume of these words can be observed. This suggests that these search terms may be related to significant changes in the stock market.
Final Remarks: Financial data possesses complex non-linear relations, and when detrended it is likely to follow a random walk. However, data mining techniques may recover valuable information hidden in the data. The dataset contained many NaN values; filling them might have caused us to uncover non-existent patterns or to miss patterns that could otherwise have been discovered.
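The concern about filled values can be made concrete with a minimal sketch on simulated data. Forward-filling a gap repeats the last observed price, so every filled tick produces a change of exactly zero. This may contribute to the zero mode and zero median of the differenced data reported earlier, although genuinely unchanged prices produce zeros as well. The series and the number of gaps below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Simulated price series with continuous values (no natural zero changes).
prices = pd.Series(rng.normal(size=100).cumsum() + 50.0)

# Remove 30 observations at random positions, keeping the first one.
missing = rng.choice(np.arange(1, 100), size=30, replace=False)
prices_with_gaps = prices.copy()
prices_with_gaps.iloc[missing] = np.nan

# Forward-fill the gaps, as done in the data preparation step.
filled = prices_with_gaps.ffill()

# Each filled tick repeats the previous value, so its change is exactly zero.
zero_changes = int((filled.diff() == 0).sum())
print(zero_changes, "of", len(filled) - 1, "changes are exactly zero")
```

Counting the zero changes per stock before and after filling is one way to gauge how much of the flat behaviour comes from the imputation.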